Engineering a Failure Detection Service for Widely Distributed Systems
Authors
Abstract
Unreliable failure detectors are recognized as important building blocks for implementing fault-tolerant distributed systems. There has also been much discussion of how to equip them with sophisticated features that allow for adaptation, flexible use, scalability, and quality-of-service enforcement. Despite that, we are not aware of any real distributed system that uses a sophisticated failure detection service. In fact, most deployed systems rely on the trivial failure detection scheme provided by the underlying communication technologies (e.g., TCP/IP timeouts). We believe that this state of affairs has two main causes: (i) there is no widely supported failure detection service API that incorporates these advanced features in a suitable way; and (ii) the benefits of using a sophisticated failure detection service are not clearly understood. This paper targets the first issue by proposing a failure detection service that addresses the main needs of widely distributed systems and implements state-of-the-art failure detection mechanisms. Moreover, to improve the usability of the service, we took special care in the design of its programming interface.
Similar resources
Neural Network Based Protection of Software Defined Network Controller against Distributed Denial of Service Attacks
Software Defined Network (SDN) is a new architecture for network management and its main concept is centralizing network management in the network control level that has an overview of the network and determines the forwarding rules for switches and routers (the data level). Although this centralized control is the main advantage of SDN, it is also a single point of failure. If this main contro...
A Novel Passive Method for Islanding Detection in Microgrids
Integration of distributed generations (DGs) in power grids is expected to play an essential role in the infrastructure and market of electrical power systems. Microgrids are small energy systems, capable of balancing captive supply and requesting resources to retain stable service within a specific boundary. Microgrids can operate in grid-connected or islanding modes. Effective islanding detec...
On the Design of a Failure Detection Service for Large-Scale Distributed Systems
It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. There are however several issues that must be addressed before such a service can actually be implemented. In this paper, we highlight the main issues related to ensuring failure detection in large-scale systems, and overview the main solutions proposed in the lit...
Cold standby redundancy optimization for nonrepairable series-parallel systems: Erlang time to failure distribution
In modeling a cold standby redundancy allocation problem (RAP) with imperfect switching mechanism, deriving a closed form version of a system reliability is too difficult. A convenient lower bound on system reliability is proposed and this approximation is widely used as a part of objective function for a system reliability maximization problem in the literature. Considering this lower bound do...
Radial Basis Neural Network Based Islanding Detection in Distributed Generation
This article presents a Radial Basis Neural Network (RBNN) based islanding detection technique. Islanding detection and prevention is a mandatory requirement for grid-connected distributed generation (DG) systems. Several methods based on passive and active detection scheme have been proposed. While passive schemes have a large non detection zone (NDZ), concern has been raised on active method ...